CV - Module Project Part-1 Seedling Species Classifier

  • DOMAIN: Botanical Research

  • CONTEXT:
University X is conducting research into the characteristics of plants and plant seedlings at various stages of growth. They have already invested in curating sample images. They require an automated system that can build a classifier capable of determining a plant's species from a photo.

  • DATA DESCRIPTION:
The dataset comprises images of 12 plant species. [Source]

  • PROJECT OBJECTIVE:

The university's management requires an automated system that can build a classifier capable of determining a plant's species from a photo.

  • Import the data.

  • Visualize

  • We can see that the dataset consists of RGB images of 12 different species, and that the images come in different shapes. We will have to resize them to a common size before they can be used in the models.

  • Finalizing the preprocessing on images

  • The models should be able to learn much better from the preprocessed images, as the noise is stripped out and the plant shapes have become immediately apparent and distinct from each other.
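The notes above do not say which preprocessing was applied. As a minimal sketch, assuming the goal is to keep only the plant and blank out soil and background, one common option is to threshold the excess-green index 2G - R - B. The function name `mask_non_green` and the `threshold` value are hypothetical, chosen only for illustration.

```python
import numpy as np

def mask_non_green(image, threshold=20):
    """Zero out pixels that are not predominantly green.

    image: uint8 array of shape (H, W, 3) in RGB order.
    threshold: hypothetical cutoff on the excess-green index 2G - R - B.
    """
    rgb = image.astype(np.int32)  # widen to avoid uint8 overflow
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    excess_green = 2 * g - r - b
    keep = excess_green > threshold  # boolean mask of shape (H, W)
    # Broadcast the mask over the channel axis; masked pixels become black.
    return image * keep[..., None].astype(image.dtype)

# Toy 1x2 image: one leaf-like green pixel, one grey background pixel.
demo = np.array([[[30, 200, 40], [120, 110, 100]]], dtype=np.uint8)
masked = mask_non_green(demo)
```

The cutoff would need tuning against real images; a mask in HSV space is another option that may behave better under uneven lighting.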

  • Resizing all images to 256x256
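A minimal sketch of bringing differently shaped images to a common 256x256 size, using nearest-neighbour sampling in NumPy so that the example has no imaging dependency. The helper `resize_nearest` is hypothetical; in practice a library routine with proper interpolation (for example Pillow's `Image.resize` or OpenCV's `cv2.resize`) would likely be the better choice.

```python
import numpy as np

def resize_nearest(image, size=(256, 256)):
    """Resize an (H, W, C) image to `size` by nearest-neighbour sampling."""
    height, width = image.shape[:2]
    new_h, new_w = size
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(new_h) * height // new_h
    cols = np.arange(new_w) * width // new_w
    return image[rows[:, None], cols]

# Two stand-in images with different shapes, as in the raw dataset.
small = np.full((120, 90, 3), 7, dtype=np.uint8)
large = np.zeros((300, 400, 3), dtype=np.uint8)

# After resizing, the images can be stacked into one model-ready array.
batch = np.stack([resize_nearest(img) for img in (small, large)])
```

Note that resizing straight to a square changes the aspect ratio of non-square photos; padding before resizing is an alternative if that distortion matters.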

  • Split the dataset into Train-Val-Test Cuts
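A minimal sketch of a stratified train/validation/test split, so that each species keeps roughly the same share in every cut. The split fractions (15% validation, 15% test) and the helper name `stratified_split` are assumptions; the notes do not state the proportions actually used. scikit-learn's `train_test_split` with the `stratify` argument, applied twice, is a common alternative.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test

# Toy label list with an imbalanced pair of classes (counts are made up).
labels = ["Black-grass"] * 20 + ["Loose Silky-bent"] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
```

The returned index lists can then be used to slice the image array and the label array consistently.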

  • Train Supervised Algorithms

  • Multinomial Naive Bayes Classifier

  • SGD Classifier

  • KNN Classifier

  • Neural Network

  • Convolutional Neural Network

  • Use Data Augmentation
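A minimal sketch of label-preserving augmentation, assuming random flips and 90-degree rotations are acceptable for top-down seedling photos. The notes do not list the transformations actually used, so the choice of transforms and the helper name `augment` are assumptions.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of a square (H, W, C) image."""
    if rng.random() < 0.5:
        image = np.fliplr(image)  # mirror left-right
    if rng.random() < 0.5:
        image = np.flipud(image)  # mirror top-bottom
    quarter_turns = int(rng.integers(0, 4))  # 0, 90, 180 or 270 degrees
    return np.rot90(image, k=quarter_turns, axes=(0, 1)).copy()

rng = np.random.default_rng(0)
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)

# Generate several variants of the same image; the label stays unchanged.
augmented = [augment(image, rng) for _ in range(8)]
```

These transforms only rearrange pixels, so they add variety without inventing new content; deep-learning frameworks typically offer richer options such as small shifts, zooms and arbitrary-angle rotations.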

  • We can see that the model predicts most species correctly, and that most of its mistakes are on Black-grass and Loose Silky-bent. This might be because both are types of grass and the model is unable to disambiguate their patterns. Collecting or augmenting more data for these classes, using additional features, or training a separate model for this pair might help boost the final accuracy further.
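A minimal sketch of how the most frequently confused class pairs can be pulled out of true and predicted labels. The label lists below are made-up toy values, not the model's actual predictions, and the "Other" class and the helper name `most_confused_pairs` are hypothetical.

```python
from collections import Counter

def most_confused_pairs(y_true, y_pred, top=3):
    """Count (true, predicted) pairs for misclassified samples, most frequent first."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top)

# Toy labels for illustration only.
y_true = ["Black-grass"] * 4 + ["Loose Silky-bent"] * 4 + ["Other"] * 2
y_pred = [
    "Loose Silky-bent", "Black-grass", "Loose Silky-bent", "Black-grass",
    "Loose Silky-bent", "Loose Silky-bent", "Black-grass", "Loose Silky-bent",
    "Other", "Other",
]

pairs = most_confused_pairs(y_true, y_pred)
for (true_label, predicted_label), count in pairs:
    print(f"{true_label} predicted as {predicted_label}: {count}")
```

Run against the real test predictions, a listing like this makes it easy to confirm whether the two grass species account for most of the errors.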

  • Final Accuracies:

Model                      | Training Accuracy | Testing Accuracy
Naive Bayes Classifier     | 30.86%            | 22.15%
SGD Classifier             | 95.80%            | 39.26%
KNN Classifier             | 58.00%            | 46.81%
Neural Net                 | 66.87%            | 60.06%
CNN                        | 87.83%            | 69.39%
CNN + Data Augmentation    | 81.27%            | 80.50%
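The accuracies above can also be read in terms of the gap between training and testing accuracy, which is a rough indicator of overfitting. A minimal sketch using the figures from the table (variable names are illustrative):

```python
# (training accuracy %, testing accuracy %) taken from the table above.
results = {
    "Naive Bayes Classifier": (30.86, 22.15),
    "SGD Classifier": (95.80, 39.26),
    "KNN Classifier": (58.00, 46.81),
    "Neural Net": (66.87, 60.06),
    "CNN": (87.83, 69.39),
    "CNN + Data Augmentation": (81.27, 80.50),
}

# Train-test gap in percentage points: larger values suggest more overfitting.
gaps = {name: round(train - test, 2) for name, (train, test) in results.items()}

best_model = max(results, key=lambda name: results[name][1])
most_overfit = max(gaps, key=gaps.get)

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: gap {gap:.2f} points")
```

On these numbers the SGD Classifier shows the widest gap (56.54 points), while the CNN with data augmentation shows the narrowest (0.77 points) along with the highest testing accuracy.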

  • Pickle the best model for future use: The final CNN model (with data augmentation) is clearly the best model, achieving an out-of-sample testing accuracy of 80.5%.
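A minimal sketch of the pickling round trip with Python's standard `pickle` module. A plain dictionary stands in for the trained model here, and the file name `best_model.pkl` is hypothetical. The notes do not name the framework used to build the CNN; if it is a deep-learning framework, that framework's own save format may be more robust than pickle.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the trained model object.
model = {
    "name": "CNN + Data Augmentation",
    "class_names": ["Black-grass", "Loose Silky-bent"],
    "weights": [0.1, -0.2, 0.3],
}

path = Path(tempfile.mkdtemp()) / "best_model.pkl"  # hypothetical file name

# Save the model to disk.
with path.open("wb") as handle:
    pickle.dump(model, handle)

# Later, or in another session: load it back for inference.
with path.open("rb") as handle:
    restored = pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.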